(54) Data processing system with digital signal processor core and co-processor

(57) A data processing system includes a digital signal
processor core (110) and a co-processor (140). The co-processor
(140) has a local memory (141, 145, 147) within the address
space of the digital signal processor core (110). The
co-processor (140) responds to commands from the digital signal
processor core (110). A direct memory access circuit (120)
autonomously transfers data to and from the local memory (141,
145, 147) of the co-processor (140). Co-processor commands are
stored in a command FIFO memory (141) mapped to a predetermined
memory address. Control commands include a receive data
synchronism command stalling the co-processor (140) until
completion of a memory transfer into the local memory (141, 145,
147). A send data synchronism command causes the co-processor
(140) to signal the direct memory access circuit (120) to
trigger a memory transfer out of the local memory (141, 145,
147). An interrupt command causes the co-processor (140) to
interrupt the digital signal processor core (110).
Description

TECHNICAL FIELD OF THE INVENTION 

[0001] The present invention relates generally to the field of
digital signal processing, and more particularly to a digital
signal processor with a core data processor and a reconfigurable
co-processor.

BACKGROUND OF THE INVENTION 

[0002] Digital signal processing is becoming more and more
common for audio and video processing. In many instances a
single digital processor can replace a host of prior discrete
analog components. The increase in processing capacity afforded
by digital signal processors has enabled more types of devices
and more functions for prior devices. This process has created
the appetite for more complex functions and features on current
devices and new types of devices. In some cases this appetite
has outstripped the ability to cost effectively deliver the
desired functionality with fully programmable digital signal
processors.

[0003] One response to this need is to couple a digital signal
processor with an application specific integrated circuit
(ASIC). The digital signal processor is programmed to handle
control functions and some signal processing. The full
programmability of the digital signal processor enables product
differentiation through different programming. The ASIC is
constructed to provide processing hardware for certain core
functions that are commonly performed and time critical. With
the increasing density of integrated circuits it is now becoming
possible to place a digital signal processor and an ASIC
hardware co-processor on the same chip.
[0004] This approach has two problems. First, this approach
rarely results in an efficient connection between the hardware
co-processor ASIC and the digital signal processor. It is
typical to handle most of the interface by programming the
digital signal processor. In many cases the digital signal
processor must supply data pointers and commands in real time as
the hardware co-processor is operating. To form safe designs, it
is typical to provide extra time for the digital signal
processor to service the hardware co-processor. This means that
the hardware co-processor is not fully used. A second problem
comes from design time. With the increasing capability to design
differing functionality, product cycles have been reduced. This
puts a premium on designing new functions quickly. The ability
to reuse programs and interfaces would aid in shortening design
cycles. However, the fixed functions implemented in the ASIC
hardware co-processor cannot easily be reused. The typical ASIC
hardware co-processor has a limited set of functions suitable
for a narrow range of problems. These designs cannot be quickly
reused even to implement closely related functions. In addition
the interface between the digital signal processor and the ASIC
hardware co-processor tends to use ad hoc techniques that are
specific to a particular product.

SUMMARY OF THE INVENTION

[0005] This invention is a data processing system including a
digital signal processor core and a co-processor. The
co-processor has a local memory within the address space of the
digital signal processor core. The co-processor is responsive to
commands from the digital signal processor core to perform
predetermined data processing operations on data stored in the
local memory in parallel with the digital signal processor core.
The data processing system includes a direct memory access
circuit under the control of the digital signal processor core.
The direct memory access circuit autonomously transfers data to
and from the local memory of the co-processor.

[0006] The co-processor responds to commands to configure itself
correspondingly to perform a set of related data processing
operations. Co-processor commands are stored in a command first
in first out (FIFO) memory. The command FIFO memory has an input
mapped to a predetermined memory address.

[0007] The co-processor is responsive to various control
commands. A receive data synchronism command pauses processing
of commands until the direct memory access circuit signals
completion of a memory transfer into the local memory. A send
data synchronism command causes the co-processor to signal the
direct memory access circuit to trigger a predetermined memory
transfer out of the local memory. An interrupt command causes
the co-processor to interrupt the digital signal processor core.
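
By way of illustration only, the behaviour of the three control
commands can be modelled in a few lines of Python. This is a
minimal sketch, not the hardware of the embodiment: the class
name CoProcessorModel, the command strings RECV_SYNC, SEND_SYNC
and INTERRUPT, and the flag dma_in_done are all assumptions
introduced here for the example.

```python
from collections import deque


class CoProcessorModel:
    """Toy software model of the control commands (hypothetical names)."""

    def __init__(self):
        self.commands = deque()     # pending commands, oldest first
        self.dma_in_done = False    # set when a transfer into local memory completes
        self.dma_out_triggers = 0   # transfers out of local memory requested
        self.interrupts = 0         # interrupts raised to the DSP core
        self.executed = []          # computational commands that have run

    def step(self):
        """Try to execute the oldest command; return False if stalled or idle."""
        if not self.commands:
            return False
        cmd = self.commands[0]
        if cmd == "RECV_SYNC":
            if not self.dma_in_done:
                return False            # stall: command stays at the head
            self.dma_in_done = False    # consume the completion signal
        elif cmd == "SEND_SYNC":
            self.dma_out_triggers += 1  # signal the DMA circuit to move data out
        elif cmd == "INTERRUPT":
            self.interrupts += 1        # interrupt the DSP core
        else:
            self.executed.append(cmd)   # any other command is computational
        self.commands.popleft()
        return True
```

In this model a queue such as RECV_SYNC, FIR, SEND_SYNC,
INTERRUPT does not advance until dma_in_done is set, after which
the filter runs, one transfer out is requested and one interrupt
is raised.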

[0008] Each command includes an indication of a data input
location within the local memory. The co-processor recalls data
from local memory starting with the indicated data input
location. Each command includes an indication of a data output
location within the local memory. The co-processor stores
resultant data in local memory starting with the indicated data
output location. The input data may be stored in a circularly
organized memory area serving as an input buffer. The resultant
data may be stored in a circularly organized memory area serving
as an output buffer.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0009] The present invention will now be further described, by
way of example, with reference to the exemplary embodiments
illustrated in the accompanying drawings, in which:

Figure 1 illustrates the combination of a digital signal
processor core and a reconfigurable hardware co-processor in
accordance with this invention;
Figure 2 illustrates the memory map logical coupling between
the digital signal processor core and the reconfigurable
hardware co-processor of this invention;
Figure 3 illustrates a manner of using the reconfigurable
hardware co-processor memory;
Figure 4 illustrates a memory management technique useful for
filter algorithms;
Figure 5 illustrates an alternative embodiment of the
combination of Figure 1 including two co-processors with a
private bus between;
Figure 6 illustrates the construction of a hardware co-processor
which is reconfigurable to perform a variety of filter
functions;
Figure 7 illustrates the input formatter of the reconfigurable
hardware co-processor illustrated in Figure 6;
Figure 8 illustrates the reconfigurable data path core of the
reconfigurable hardware co-processor illustrated in Figure 6;
Figure 9 illustrates the output formatter of the reconfigurable
hardware co-processor illustrated in Figure 6;
Figure 10 illustrates the data flow connections through the
data path core for performing a real finite impulse response
filter;
Figure 11 illustrates the data flow connections through the
data path core for performing a complex finite impulse response
filter;
Figure 12 illustrates the data flow connections through the
data path core for performing a coefficient update function; and
Figure 13 illustrates the data flow connections through the
data path core for performing a fast Fourier transform.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0010] Figure 1 illustrates circuit 100 including a digital
signal processor core 110 and a reconfigurable hardware
co-processor 140. In accordance with the preferred embodiment of
this invention, these parts are formed in a single integrated
circuit. Digital signal processor core 110 may be of
conventional design. In the preferred embodiment digital signal
processor core 110 is adapted to control direct memory access
circuit 120 for autonomous data transfers independent of digital
signal processor core 110. External memory interface 130 serves
to interface the internal data bus 101 and address bus 103 to
their external counterparts, external data bus 131 and external
address bus 133, respectively. External memory interface 130 is
conventional in construction. Integrated circuit 100 may
optionally include additional conventional features and
circuits. Note particularly that the addition of cache memory to
integrated circuit 100 could substantially improve performance.
The parts illustrated in Figure 1 are not intended to exclude
the provision of other conventional parts. Those conventional
parts illustrated in Figure 1 are merely the parts most affected
by the addition of reconfigurable hardware co-processor 140.

[0011] Reconfigurable hardware co-processor 140 is coupled to
other parts of integrated circuit 100 via data bus 101 and
address bus 103. Reconfigurable hardware co-processor 140
includes command memory 141, co-processor logic core 143, data
memory 145 and coefficient memory 147. Command memory 141 serves
as the conduit by which digital signal processor core 110
controls the operations of reconfigurable hardware co-processor
140. This feature will be further illustrated in Figure 2.
Co-processor logic core 143 is responsive to commands stored in
command memory 141 to perform co-processing functions. These
co-processing functions involve exchange of data between
co-processor logic core 143 and data memory 145 and coefficient
memory 147. Data memory 145 stores the input data processed by
reconfigurable hardware co-processor 140 and further stores the
resultant of the operations of reconfigurable hardware
co-processor 140. The manner of storing this data will be
further described below with respect to Figure 2. Coefficient
memory 147 stores the unchanging or relatively unchanging
process parameters called coefficients used by co-processor
logic core 143. Though data memory 145 and coefficient memory
147 have been illustrated as separate parts, it would be easy to
employ these merely as different portions of a single, unified
memory. As will be shown below, for the multiple multiply
accumulate co-processor described below it is best if such a
single unified memory has two read ports for data and
coefficients and two write ports for writing the output data. It
is believed best that the memory accessible by reconfigurable
hardware co-processor 140 be located on the same integrated
circuit in physical proximity to co-processor logic core 143.
This physical closeness is needed to accommodate the wide memory
buses required by the desired data throughput of co-processor
logic core 143.
[0012] Figure 2 illustrates the memory mapped interface between
digital signal processor core 110 and reconfigurable hardware
co-processor 140. Digital signal processor core 110 controls
reconfigurable hardware co-processor 140 via command memory 141.
In the preferred embodiment command memory 141 is a
first-in-first-out (FIFO) memory. The write port of command
memory 141 is memory mapped into a single memory location within
the address space of digital signal processor core 110. Thus
digital signal processor core 110 controls reconfigurable
hardware co-processor 140 by writing commands to the address
serving as the input to command memory 141. Command memory 141
preferably includes two circularly oriented pointers. The write
pointer 151 points to the location within command memory 141
where the next received command is to be stored. Each time there
is a write to the predetermined address of command memory 141,
write pointer 151 selects the physical location receiving the
data. Following such a data write, write pointer 151 is updated
to point to the next physical location within command memory
141. Write pointer 151 is circularly oriented in that it wraps
around from the last physical location to the first physical
location. Reconfigurable hardware co-processor 140 reads
commands from command memory 141 in the same order as they are
received (FIFO) using read pointer 153. Read pointer 153 points
to the physical location within command memory 141 storing the
next command to be read. Read pointer 153 is updated to
reference the next physical location within command memory 141
following each such read. Note that read pointer 153 is also
circularly oriented and wraps around from the last physical
location to the first physical location. Command memory 141
includes a feature preventing write pointer 151 from passing
read pointer 153. This may take place, for example, by refusing
to write and sending a memory fault signal back to digital
signal processor core 110 when write pointer 151 and read
pointer 153 reference the same physical location. Thus the FIFO
buffer of command memory 141 can be full and not accept
additional commands.
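
The pointer behaviour described for command memory 141 can be
sketched in software as follows. This is a hedged illustration
in Python rather than a description of the actual circuit. The
class name CommandFifo is hypothetical, a refused write is
represented by a False return value in place of the memory fault
signal, and the occupancy counter is an assumption added here so
that the model can tell a full buffer from an empty one when the
two pointers coincide.

```python
class CommandFifo:
    """Circular command buffer with wrapping read and write pointers."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.write_ptr = 0   # next physical location to receive a command
        self.read_ptr = 0    # physical location of the next command to read
        self.count = 0       # assumed occupancy counter (full versus empty)

    def write(self, command):
        """Store a command; return False (fault) if the buffer is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.write_ptr] = command
        self.write_ptr = (self.write_ptr + 1) % len(self.slots)  # wrap around
        self.count += 1
        return True

    def read(self):
        """Return the oldest command, or None if the buffer is empty."""
        if self.count == 0:
            return None
        command = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)    # wrap around
        self.count -= 1
        return command
```

With a depth of two, a third write is refused until one command
has been read, and commands always come out in the order in
which they were written.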
[0013] Data memory 145 and coefficient memory 147 are both
mapped within the data address space of digital signal processor
core 110. As illustrated in Figure 2, data bus 101 is
bidirectionally coupled to memory 149. In accordance with the
alternative embodiment noted above, both data memory 145 and
coefficient memory 147 are formed as a part of memory 149.
Memory 149 is also accessible by co-processor logic core 143
(not illustrated in Figure 2). Figure 2 illustrates three
circumscribed areas of memory within memory 149. As will be
further described below, reconfigurable hardware co-processor
140 preferably performs several functions employing differing
memory areas. Note that due to the
[0014] Integrated circuit 100 operates as follows. Digital
signal processor core 110 controls the data and coefficients
used by reconfigurable hardware co-processor 140 by loading the
data into data memory 145 and the coefficients into coefficient
memory 147. Alternatively, digital signal processor core 110
loads the data and coefficients into the unified memory 149.
Digital signal processor core 110 may be programmed to perform
this data transfer directly. Digital signal processor core 110
may alternatively be programmed to control direct memory access
circuit 120 to perform this data transfer. Particularly for
audio or video processing applications, the data stream is
received at a predictable rate and from a predictable input
device. Thus it would typically be efficient for digital signal
processor core 110 to control direct memory access circuit 120
to make transfers from external memory to memory accessible by
reconfigurable hardware co-processor 140.
[0015] Following the transfer of data to be processed, digital
signal processor core 110 signals reconfigurable hardware
co-processor 140 with the command for the desired signal
processing algorithm. As previously stated, commands are sent to
reconfigurable hardware co-processor 140 by a memory write to a
predetermined address. Received commands are stored in command
memory 141 on a first-in-first-out basis.
[0016] Each computational command of reconfigurable hardware
co-processor 140 preferably includes a manner to specify the
particular function to be performed. In the preferred
embodiment, reconfigurable hardware co-processor 140 is
constructed to be reconfigurable. Reconfigurable hardware
co-processor 140 has a set of functional units, such as
multipliers and adders, that can be connected together in
differing ways to perform different but related functions. The
set of related functions selected for each reconfigurable
hardware co-processor will be based upon a similarity of the
mathematics of the functions. This similarity in mathematics
enables similar hardware to be reconfigured for the plural
functions. The command may indicate the particular computation
via an opcode in the manner of data processor instructions.

[0017] Each computational command includes a manner of
specifying the location of the data to be used by the
computation. There are many suitable methods of designating the
data space. For example, the command may specify a starting
address and number of data words or samples within the block.
The data size may be specified as a parameter or it may be
specified by the opcode defining the computation type. As a
further example, the command may specify the data size, the
starting address and the ending address of the input data. Note
that known indirect methods of specifying where the input data
is stored may be used. The command may include a pointer to a
register or a memory location storing any of these parameters
such as start address, data size, number of samples within the
data block and end address.

[0018] Each computational command must further indicate the
memory address range storing the output data for the particular
command. This indication may be made by any of the methods
listed above with regard to the locations storing the input
data. In many cases the computational function will be a filter
function and the amount of output data following processing will
be about equivalent to the amount of input data. In other cases,
the amount of output data may be more or less than the amount of
input data. In any event, the amount of resultant data is known
from the amount of input data and the type of computational
function requested. Thus merely specifying the starting address
provides sufficient information to indicate where all the
resultant data is to be stored. It is feasible to store the
output data in a destructive manner, over-writing input data
during processing. Alternatively, the output data may be written
to a different portion of memory and the input data preserved at
least temporarily. The selection between these alternatives may
depend upon whether the input data will be reused.

[0019] Figure 3 illustrates one useful technique involving
alternately employing two memory areas. One memory area 144
stores the input data needed for the co-processor function. The
relatively constant coefficients are stored in coefficient
memory 147. This data is recalled for use by co-processor logic
core 143 (1 read). The output data is written into the second
memory area 146 (1 write). Following use of the data in memory
area 144, direct memory access circuit 120 writes the data for
the next block, overwriting the data previously used (2 write).
At the same time, direct memory access circuit 120 reads data
from memory area 146 ahead of it being overwritten by
reconfigurable hardware co-processor 140 (2 read). These two
memory areas for input data and for resultant data could be
configured as circular buffers. In a product that requires
plural related functions, separate memory areas defined as
circular buffers can be employed. One memory area configured as
a circular buffer will be allocated to each separate function.

[0020] The format of computational commands preferably closely
resembles the format of a subroutine call instruction in a high
level language. That is, the command includes a command name
similar in function to the subroutine name specifying the
particular computational function to be performed. Each command
also includes a set of parameters specifying options available
within the command type. These parameters may take the form of
direct quantities or variables, which are pointers to registers
or memory locations storing the desired quantities. The number
and type of these parameters depend upon the command type. This
subroutine call format is important in reusing programs written
for digital signal processor core 110. Upon use, the programmer
or the compiler provides a stub subroutine for reconfigurable
hardware co-processor 140. This stub subroutine merely receives
the subroutine parameters and forms the corresponding
co-processor command using these parameters. The stub subroutine
then writes this command to the predetermined memory address
reserved for command transfer to reconfigurable hardware
co-processor 140 and then returns. This invention envisions that
the computational capacity of digital signal processor cores
will increase regularly with time. Thus the processing
requirements of a particular product may require the combination
of digital signal processor core 110 and reconfigurable hardware
co-processor 140 at one point in time. At a later point in time,
the available computational capacity of an instruction set
compatible digital signal processor core may increase so that
the functions previously requiring a reconfigurable hardware
co-processor may be performed in software by the digital signal
processor core. The prior program code for the product may be
easily converted to the new, more powerful digital signal
processor. This is achieved by providing independent subroutines
for each of the commands supported by the replaced
reconfigurable hardware co-processor. Then each place where the
original program employs the subroutine stub to transmit a
command to the reconfigurable hardware co-processor is replaced
by the corresponding subroutine call. Extensive reprogramming is
thus avoided.
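
The relationship between a stub subroutine and its later
software replacement can be illustrated with the following
Python sketch. Everything named here is an assumption made for
the example: the address COMMAND_ADDRESS, the tuple layout of
the command, the bus_write callback and the choice of a simple
real FIR filter with zero initial history. The sketch only shows
that the two routines accept the same kind of parameters, so a
call site can switch from one to the other.

```python
COMMAND_ADDRESS = 0x8000  # hypothetical memory-mapped command FIFO input


def fir_stub(bus_write, in_addr, out_addr, count, coef_addr, taps):
    """Stub: form the co-processor command, write it, then return."""
    command = ("FIR", in_addr, out_addr, count, coef_addr, taps)
    bus_write(COMMAND_ADDRESS, command)


def fir_software(memory, in_addr, out_addr, count, coef_addr, taps):
    """Software replacement: compute the same filter on the DSP core.

    memory is a flat list standing in for the shared data memory.
    Samples before in_addr are treated as zero (an assumption).
    """
    for n in range(count):
        acc = 0
        for k in range(taps):
            if n - k >= 0:
                acc += memory[coef_addr + k] * memory[in_addr + n - k]
        memory[out_addr + n] = acc
```

In the first case the call returns as soon as the command has
been posted and the co-processor does the work. In the second
case the same addresses and sizes are passed and the result is
computed in software before the call returns.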

[0021] Following completion of processing on one block of data,
the data may be transferred out of data memory 145 or unified
memory 149. This second transfer can take place either by direct
action of digital signal processor core 110 reading the data
stored at the output memory locations or through the aid of
direct memory access circuit 120. This output data may represent
the output of the process. In this event, the data is
transferred to a utilization device. Alternatively, the output
data of reconfigurable hardware co-processor 140 may represent
work in progress. In this case, the data will typically be
temporarily stored in memory external to integrated circuit 100
for later retrieval and further processing.

[0022] Reconfigurable hardware co-processor 140 is then ready
for further use. This further use may be additional processing
of the same function. In this case, the process described above
is repeated on a new block of data in the same way. This further
use may be processing of another function. In this case, the new
data must be loaded into memory accessible by reconfigurable
hardware co-processor 140, the new command loaded and then the
processed data read for output or further processing.

[0023] Reconfigurable hardware co-processor 140 preferably will
be able to perform more than one function of the product
algorithm. Many digital signal processing tasks will use plural
instances of similar functions. For example, the process may
include many similar filter functions. Reconfigurable hardware
co-processor 140 preferably has sufficient processing capability
to perform all these filter functions in real time. The
advantage of operating on blocks of data rather than discrete
samples will be evident when reconfigurable hardware
co-processor 140 operates in such a system. As an example,
suppose that reconfigurable hardware co-processor 140 performs
three functions, A, B and C. These functions may be sequential
or they may be interleaved with functions performed by digital
signal processor core 110. Reconfigurable hardware co-processor
140 first performs function A on a block of data. This function
is performed as outlined above. Digital signal processor core
110 either directly or by control of direct memory access
circuit 120 loads the data into memory area 155 of memory 149.
Upon issue of the command for configuration for function A,
which specifies the amount of data to be processed,
reconfigurable hardware co-processor 140 performs function A and
stores the resultant data in the portion of memory area 155
specified by the command. A similar process occurs to cause
reconfigurable hardware co-processor 140 to perform function B
on data stored in memory area 157 and return the result to
memory area 157. The performance of function A may take place
upon data blocks having a size unrelated to the size of the data
blocks for function B. Finally, reconfigurable hardware
co-processor 140 is commanded to perform function C on data
within memory area 159, returning the resultant to memory area
159. The block size for performing function C is independent of
the block sizes selected for functions A and B.
[0024] The usefulness of the block processing is seen from this
example. The three functions A, B and C will typically have data
flow rates that are independent, that is not necessarily equal.
Provision of special hardware for each function will sacrifice
the generality of functionality and reusability of
reconfigurable hardware. Further, it would be difficult to match
the resources granted to each function in hardware to provide a
balance and the best utilization of the hardware. When
reconfigurable hardware is used there is inevitably an overhead
cost for switching between configurations. Operating on a sample
by sample basis for flow through the three functions would
require a maximum number of such reconfiguration switches. This
would clearly be less than optimal. Thus operating each function
on a block of data before reconfiguration to switch between
functions would reduce this overhead. Additionally, it would
then be relatively easy to allocate resources between the
functions by selecting the amount of time devoted to each
function. Lastly, such block processing would generally require
less control overhead from the digital signal processor core
than switching between functions at a sample level.
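
The reduction in reconfiguration overhead can be made concrete
with a small calculation. The Python sketch below is only a
rough model under stated assumptions: each function is assumed
to need one reconfiguration per block, and all functions are
assumed to share one block size, which is a simplification since
the block sizes may in general be chosen independently. The
function name reconfiguration_switches is introduced here for
the example.

```python
def reconfiguration_switches(total_samples, block_size, num_functions=3):
    """Count configuration switches for num_functions run in turn per block."""
    blocks = -(-total_samples // block_size)  # ceiling division
    return blocks * num_functions


# Sample by sample flow through functions A, B and C.
per_sample = reconfiguration_switches(1024, 1)
# The same data processed in blocks of 64 samples.
per_block = reconfiguration_switches(1024, 64)
```

Under these assumptions, 1024 samples need 3072 switches when
processed sample by sample and 48 switches when processed in
blocks of 64 samples.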

[0025] The block sizes selected for the various functions A, B
and C will depend upon the relative data rates required and the
data sizes. In addition, the tasks assigned to digital signal
processor core 110 and their respective computational
requirements must also be considered. Ideally, both digital
signal processor core 110 and reconfigurable hardware
co-processor 140 would be nearly fully loaded. This would result
in optimum use of the resources. Such balanced loading of
digital signal processor core 110 and reconfigurable hardware
co-processor 140 may only be achieved with product algorithms
that can use reconfigurable hardware co-processor 140 for about
50% of the computations. For the case in which reconfigurable
hardware co-processor 140 can perform more than half of the
minimum required computations, additional features implemented
on digital signal processor core 110 can be added to the product
to match the loading. This would result in use of spare
computational resources in digital signal processor core 110.
The loading of computational processes may be statically
determined. Such static computational allocation can best be
made when both digital signal processor core 110 and
reconfigurable hardware co-processor 140 perform fixed and known
functions. If the computational load is expected to change with
time, then it will probably be best to dynamically allocate
computational resources between digital signal processor core
110 and reconfigurable hardware co-processor 140 at run time. It
is anticipated that the processes performed by reconfigurable
hardware co-processor 140 will remain relatively stable and only
the processes performed by digital signal processor core 110
would vary.
[0026] Figure 4 shows a memory management technique that enables
better interruption of operations. Data 400 consisting of data
blocks 401, 402 and 403 passes the window 410 of a finite
impulse response filter. Such filters operate on a time history
of data. Three processes A, B and C operate in respective
circular buffers 421, 431 and 441 within data memory 145. Such a
circular buffer enables the history to be preserved. Thus when
processing the next block following other processing, the
history data is available at predictable addresses for use. This
history data is just before the newly written data for the next
block.

[0027] This technique works well except if memory space needs to
be cleared to permit another task. In that event, the history
data could be flushed and reloaded upon resumption of the filter
processing. Alternatively, the history data needed for the next
block could be moved to another area of memory 145 or to an
external memory attached to external memory interface 130.
Either of these methods is disadvantageous because they require
time to move data. This either delays servicing the interrupt or
resuming the original task.
[0028] A preferred alternative is illustrated schematically in
Figure 4. During the writing of the resultant data to its place
in memory, the current sample is written to a smaller area of
memory. For example, input data from circular buffer 421 is
written into history buffer 423, input data from circular buffer
431 is written into history buffer 433, and input data from
circular buffer 441 is written into history buffer 443. Each of
the history buffers 423, 433 and 443 is just the size needed to
store the history according to the width of the corresponding
filter window such as filter window 410. Upon completion of
processing of a block of data, the most recent history is stored
in this restricted area. If the co-processor must be interrupted
the data within the circular buffers 421, 431 and 441 may be
cleared without erasing the history data stored in history
buffers 423, 433 and 443. This technique spares the need for
reloading the data or storing the data elsewhere prior to
beginning the interrupt task. In many filter tasks enough write
memory bandwidth will be available to achieve writing to the
history buffers without requiring extra cycles. Another
advantage of this technique is that less memory need be
allocated to circular buffers 421, 431 and 441 than previously.
In the previous technique, the circular buffers must be large
enough to include an entire block of data and an additional
amount equal to the required history data. The technique
illustrated in Figure 4 enables the size of the circular buffers
421, 431 and 441 to be reduced to just enough to store one block
of data.
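
The history buffer idea can be sketched in Python as follows.
This is a minimal illustration under assumptions, not the memory
layout of Figure 4: the function name fir_block is hypothetical,
ordinary Python lists stand in for the circular buffers and
history buffers, and the history is assumed to hold exactly one
sample fewer than the number of filter coefficients.

```python
def fir_block(block, coeffs, history):
    """Filter one block of samples using a separately kept history.

    history must hold the last len(coeffs) - 1 input samples.
    Returns the output block and the history for the next block.
    """
    need = len(coeffs) - 1
    extended = list(history) + list(block)  # history sits just before new data
    out = []
    for n in range(len(block)):
        acc = 0
        for k, c in enumerate(coeffs):
            acc += c * extended[need + n - k]
        out.append(acc)
    # Keep only the most recent samples; the block itself may now be cleared.
    new_history = extended[len(extended) - need:]
    return out, new_history
```

Because only new_history is carried from one call to the next,
the storage holding a processed block can be reused by another
task, and filtering two consecutive blocks gives the same output
as filtering the joined data in one pass.
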
[0029] Many algorithms useful in audio and video signal
processing involve adapting coefficients. That is, there is
some feedback path that changes the function performed over
time. An example of such an algorithm is a modem that requires
a time to adapt to the particular line employed and the
operation of the far end modem. Initially it would seem that
performing such adaptive functions in block mode would
adversely affect the convergence of these adaptive functions.
Review of the mathematics involved in many such functions
shows otherwise. The amount of adaption that can be performed
at a particular time generally depends upon the amount of data
available for computing the adaption. This amount of available
data does not depend upon whether the data is processed sample
by sample or in blocks of samples. In practice the rate of
adaption will be about the same. Adaption on a sample by
sample basis would result in convergence toward the fully
adapted coefficients in many small steps. Adaption based upon
blocks of data would result in convergence in fewer and larger
steps. This is because the greater amount of data available
would drive a larger error term for correction in the block
processing case. However, the average convergence slope would
be the same for the two cases. In cases where most of the
adaption takes place upon initialization and most of the
processing takes place under steady state conditions, such as
the previous modem example, there would be little practical
difference. In cases where the adaptive filter must follow a
moving target, it is not clear whether adaption on a sample by
sample basis is better than adaption on a block basis. If, for
example, the process followed varies at a frequency greater
than the inverse of the time of the block size, then adaption
on a block basis may prevent useless hunting in small steps as
compared with sample by sample adaption. Thus adaptive
filtering on a block basis has no general disadvantage over
adaptive filtering on a sample by sample basis.
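The comparison between sample by sample and block adaption can be tried numerically. The sketch below uses the least mean squares (LMS) update as a stand-in adaptive algorithm; the text does not name a specific algorithm, so LMS, the step size `mu`, the block length and all function names are assumptions chosen for illustration.

```python
import random


def make_signal(n, true_w, seed=0):
    """White input x and the noise-free output d of an unknown FIR filter."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    d = [
        sum(w * x[i - k] for k, w in enumerate(true_w) if i - k >= 0)
        for i in range(n)
    ]
    return x, d


def tap_vector(x, i, taps):
    """Most recent `taps` inputs at time i, zero before the start."""
    return [x[i - k] if i - k >= 0 else 0.0 for k in range(taps)]


def lms_sample(x, d, taps, mu):
    """Update the coefficients after every sample (many small steps)."""
    w = [0.0] * taps
    for i in range(len(x)):
        v = tap_vector(x, i, taps)
        err = d[i] - sum(wk * vk for wk, vk in zip(w, v))
        w = [wk + mu * err * vk for wk, vk in zip(w, v)]
    return w


def lms_block(x, d, taps, mu, block):
    """Accumulate the correction over a block, then apply one larger step."""
    w = [0.0] * taps
    for start in range(0, len(x), block):
        grad = [0.0] * taps
        for i in range(start, min(start + block, len(x))):
            v = tap_vector(x, i, taps)
            err = d[i] - sum(wk * vk for wk, vk in zip(w, v))
            grad = [g + err * vk for g, vk in zip(grad, v)]
        w = [wk + mu * g for wk, g in zip(w, grad)]
    return w
```

With the same `mu`, the block version takes fewer but larger steps, and in this noise-free setting both versions settle on the same coefficients.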
[0030] The command set of reconfigurable hardware
co-processor 140 preferably includes several non-computational
instructions for control functions. These control functions
will be useful in cooperation between digital signal processor
core 110 and reconfigurable hardware co-processor 140. The
first of these non-computational commands is a receive data
synchronization command. This command will typically be used
in conjunction with data transfers handled by direct memory
access circuit 120. Digital signal processor core 110 will
control the process by setting up the input data transfer
through direct memory access circuit 120. Digital signal
processor core 110 will send two commands to reconfigurable
hardware co-processor 140. The first command is the receive
data synchronization command. The second command is the
computational command desired.

[0031] Reconfigurable hardware co-processor 140 operates on
commands stored in command memory 141 on a first-in-first-out
basis. Upon reaching the receive data synchronization command
reconfigurable hardware co-processor 140 will stop.
Reconfigurable hardware co-processor 140 will remain idle
until it receives a control signal from direct memory access
circuit 120 indicating completion of the input data transfer.
Note that upon such completion of this input data transfer,
the data for the next block is stored in data memory 145 or
unified memory 149. Direct memory access circuit 120 may be
able to handle plural queued data transfers. This is known in
the art as plural DMA channels. In this case the receive data
synchronization command must note the corresponding DMA
channel, which would be known to digital signal processor core
110 before transmission of the receive data synchronization
command. Direct memory access circuit 120 would transmit the
channel number of each completed data transfer. This would
permit reconfigurable hardware co-processor 140 to match the
completed direct memory access with the corresponding receive
data synchronization command. Reconfigurable hardware
co-processor 140 would continue to the next command only if a
completed direct memory access signal indicated the same DMA
channel as specified in the receive data synchronization
command.
[0032] Following this completion signal, reconfigurable
hardware co-processor 140 advances to the next command in
command memory 141. In this case this next command is a
computational command using the data just loaded. Since this
computational command cannot start until the previous receive
data synchronization command completes, this assures that the
correct data has been loaded.
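The stall-until-transfer-complete behaviour can be modelled with a small software simulation. This is a sketch under assumptions: the class `CoProcessor`, the command tuples `("recv_sync", channel)` and `("compute", name)`, and the return strings are all hypothetical names, not part of the command set described here.

```python
from collections import deque


class CoProcessor:
    """Toy model of a command FIFO with a receive data sync command."""

    def __init__(self):
        self.fifo = deque()         # command memory, first-in-first-out
        self.done_channels = set()  # DMA channels that signalled completion
        self.log = []               # computational commands actually run

    def write_command(self, command):
        """The core queues a (kind, argument) command."""
        self.fifo.append(command)

    def dma_complete(self, channel):
        """The DMA circuit reports the channel of a finished transfer."""
        self.done_channels.add(channel)

    def run(self):
        """Execute commands until the FIFO empties or a sync stalls."""
        while self.fifo:
            kind, arg = self.fifo[0]
            if kind == "recv_sync":
                if arg not in self.done_channels:
                    return "stalled"  # wait for the matching channel
                self.done_channels.discard(arg)
            else:
                self.log.append(arg)
            self.fifo.popleft()
        return "idle"
```

In this model the core only queues the pair of commands and sets up the transfer; it takes no interrupt when the transfer finishes, since the completion signal goes straight to the co-processor.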

[0033] This combination of the receive data synchronization
command and the computational command reduces the control
burden on digital signal processor core 110. Digital signal
processor core 110 need only set up direct memory access
circuit 120 to make the input data transfer and send the pair
of commands to reconfigurable hardware co-processor 140. This
would assure that the input data transfer had completed prior
to beginning the computational operation. This greatly reduces
the amount of software overhead required by the digital signal
processor core 110 to control the function of reconfigurable
hardware co-processor 140. Otherwise, digital signal processor
core 110 may need to receive an interrupt from direct memory
access circuit 120 signaling the completion of the input data
load operation. An interrupt service routine must be written
to service the interrupt. In addition, such an interrupt would
require a context switch to send the co-processor command to
command memory and another context switch to return from the
interrupt. Consequently, the receive data synchronization
command frees considerable capacity within digital signal
processor core 110 for more productive use.

[0034] Another non-computational command is a send data
synchronization command. The send data synchronization command
is nearly the inverse of the receive data synchronization
command. Upon reaching the send data synchronization command,
reconfigurable hardware co-processor 140 triggers a direct
memory access operation. This direct memory access operation
reads data from data memory 145 or unified memory 149 for
storage at another system location. This direct memory access
operation may be preset by digital signal processor core 110
and is merely begun upon receipt of a signal from
reconfigurable hardware co-processor 140 upon encountering the
send data synchronization command. In the case in which direct
memory access circuit 120 supports plural DMA channels, the
send data synchronization command must specify the DMA channel
triggered. Alternatively, the send data synchronization
command may specify the control parameters for direct memory
access circuit 120, including the DMA channel if more than one
channel is supported. Upon encountering such a send data
synchronization command, reconfigurable hardware co-processor
140 communicates directly with direct memory access circuit
120 to set up and start an appropriate direct memory access
operation.

[0035] Another possible non-computational command is a
synchronization completion command. Upon encountering a
synchronization completion command, reconfigurable hardware
co-processor 140 sends an interrupt to digital signal
processor core 110. Upon receiving such an interrupt, digital
signal processor core 110 is assured that all prior commands
sent to reconfigurable hardware co-processor 140 have
completed. Depending upon the application, it may be better to
control via interrupts than through send and receive data
synchronization commands. It may also be better to queue
several operations for reconfigurable hardware co-processor
140 using send and receive data synchronization commands, and
then interrupt digital signal processor core 110 at the end of
the queue. This may be useful for higher level control
functions by digital signal processor core 110 following the
queued operations by reconfigurable hardware co-processor 140.
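A queue that mixes computational commands, send data synchronization commands and one final synchronization completion command might be traced as below. This is a minimal sketch: the command names `"compute"`, `"send_sync"` and `"sync_complete"` and the event tuples are hypothetical labels for illustration only.

```python
def run_queue(commands):
    """Walk a command queue in order and record the resulting events."""
    events = []
    for kind, arg in commands:
        if kind == "compute":
            events.append(("computed", arg))
        elif kind == "send_sync":
            # The co-processor triggers the preset DMA transfer itself.
            events.append(("dma_triggered", arg))
        elif kind == "sync_complete":
            # All prior commands are done; interrupt the core once.
            events.append(("interrupt_core", None))
        else:
            raise ValueError("unknown command: %r" % (kind,))
    return events


queue = [
    ("compute", "block 0"),
    ("send_sync", 1),
    ("compute", "block 1"),
    ("send_sync", 1),
    ("sync_complete", None),
]
trace = run_queue(queue)
```

In this trace the core is interrupted once, after both blocks have been computed and both output transfers have been triggered, rather than once per operation.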
[0036] Figure 5 illustrates another possible arrangement of
circuit 100. Circuit 100 illustrated in Figure 5 includes two
reconfigurable hardware co-processors. Digital signal
processor core 110 operates with first reconfigurable hardware
co-processor 140 and second reconfigurable hardware
co-processor 180. A private bus 185 couples first
reconfigurable hardware co-processor 140 and second
reconfigurable hardware co-processor 180. These co-processors
have private memories sharing the memory space of digital
signal processor core 110. The data can be transferred via
private bus 185 by one co-processor writing to the address
range encompassed by the other co-processor's private memory.
Alternatively, each co-processor may have an output port
directed toward an input port of the other co-processor with
the links between co-processors encompassed in private bus
185. This construction may be particularly useful for products
in which data flows from one type of operation handled by one
co-processor to another type of operation handled by the
second co-processor. This private bus frees digital signal
processor core 110 from having to handle the data handoff
either directly or via direct memory access circuit 120.
[0037] Figures 6 to 9 illustrate the construction of an
exemplary reconfigurable hardware co-processor. This
particular co-processor is called a multiple
multiply-accumulator. The multiply-accumulate operation, where
the sum of plural products is formed, is widely used in signal
processing. Many filter algorithms are built around these
functions.

[0038] Figure 6 illustrates the overall general architecture
of multiple multiply-accumulator 140. Data memory 145 and
coefficient memory 147 may be written to in 128 bit words.
This write operation is controlled by digital signal processor
core 110 or direct memory access circuit 120. Address
generator 150 generates the addresses for recall of data and
coefficients used by the co-processor. This read operation
operates on data words of 128 bits from each memory.

[0039] These recalled data words are supplied to input
formatter 160. Input formatter 160 performs various shift and
alignment operations generally to arrange the 128 bit input
data words into the order needed for the desired computation.
Input formatter 160 outputs a 128 bit (8 by 16 bits) Data X, a
128 bit (8 by 16 bits) Data Y and a 64 bit (2 by 32 bits)
Data Z.

[0040] These three data streams are supplied to datapath 170.
Datapath 170 is the operational portion of the co-processor.
As will be further described below, datapath 170 includes
plural hardware multipliers and adders that are connectable in
various ways to perform a variety of multiply-accumulate
operations. Datapath 170 outputs two adder data streams. Each
of these is four 32 bit data words.

[0041] These two data streams supply the inputs to output
formatter 180. Output formatter 180 rearranges the two data
streams into two 128 bit data words for writing back into the
two memories. The addresses for these write operations are
computed by address generator 150. This rearrangement may take
care of alignment on memory word boundaries.
[0042] The operations of co-processor 140 are under control of
control unit 190. Control unit 190 recalls the commands from
command memory 141 and provides the corresponding control
within co-processor 140.
[0043] The construction of input formatter 160 is illustrated
in Figure 7. Each of the two data streams of 128 bits is
supplied to an input of multiplexers 205 and 207. Each
multiplexer independently selects one input for storage in its
corresponding register 215 and 217. Multiplexer 205 may select
to recycle the contents of register 215 as well as either data
stream. Multiplexer 207 may only select one of the input data
streams. Multiplexers 201 and 203 may select the contents of
register 215 or may select recycling of the contents of their
respective registers 211 and 213. Multiplexer 219 selects the
contents of either register 211 or 213 for supply to the upper
bits of shifter 221. The lower bits are supplied from register
215. Shifter 221 shifts and selects only 128 bits of its 256
input bits. These 128 bits are supplied to duplicate/swap unit
223. Duplicate/swap unit 223 may duplicate a portion of its
input into the full 128 bits or it may rearrange the data
order. Thus sorted, the data is






temporarily stored in register 225. This forms the Data X
input to datapath 170. The output of multiplexer 207 is
supplied directly to multiplexer 233 as well as supplied via
register 217. Multiplexer 233 selects 192 bits from the bits
supplied to it. The upper 128 bits form the Data Y input to
datapath 170. These bits may be recirculated via multiplexer
235. The lower 64 bits form the Data Z input to datapath 170.

[0044] Figure 8 illustrates in block diagram form the
construction of datapath 170. Various segments of the Data X
and the Data Y inputs supplied from the input formatter are
supplied to dual multiply adders 310, 320, 330 and 340. As
shown, the first and second 16 bit data words Data X[0:1] and
Data Y[0:1] are coupled to dual multiply adder 310, the third
and fourth 16 bit data words Data X[2:3] and Data Y[2:3] are
coupled to dual multiply adder 320, the fifth and sixth 16 bit
data words Data X[4:5] and Data Y[4:5] are coupled to dual
multiply adder 330 and the seventh and eighth 16 bit data
words Data X[6:7] and Data Y[6:7] are coupled to dual multiply
adder 340. Because these units are identical, only dual
multiply adder 310 will be described in detail. The least
significant 16 Data X and Data Y bits supply inputs to
multiplier 311. Multiplier 311 receives the pair of 16 bit
inputs and produces a 32 bit product. This product is stored
in a pair of pipeline output registers. The 32 bit output is
supplied to both sign extend unit 313 and an 8 bit left
shifter 314. Sign extend unit 313 repeats the sign bit of the
product, which is the most significant bit, to 40 bits. The 8
bit left shifter 314 left shifts the 32 bit product and zero
fills the vacated least significant bits. One of these two 40
bit quantities is selected in multiplexer 316 for application
to a first input of 40 bit adder 319. In a similar fashion,
the next most significant 16 Data X and Data Y bits are
supplied to respective inputs of multiplier 312. Multiplier
312 receives the two 16 bit inputs and produces a 32 bit
product. The product is stored in a pair of pipeline
registers. The 8 bit right shifter 315 right shifts the
product by 8 bits and zero fills the vacated most significant
bits. Multiplexer 317 selects from among three quantities. The
first quantity is a concatenation of the 16 Data X bits and
the 16 Data Y bits at the input. This input allows multiplier
312 to be bypassed. If selected, the 32 bits (as sign extended
by sign extender 318) are added to the product produced by
multiplier 311. The second quantity is the product supplied by
multiplier 312. The third quantity is the shifted output of 8
bit right shifter 315. The selected quantity from multiplexer
317 is sign extended to 40 bits by sign extend unit 318. The
sign extended 40 bit quantity is the second input to 40 bit
adder 319. Adder 319 is provided with 40 bits even though the
16 bit input factors would produce only 32 bits to provide
dynamic range for plural multiply accumulates.
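A behavioural sketch of one dual multiply adder may help in following the selections above. It is a software model under assumptions, not the circuit: the selector strings are hypothetical, the bypass path is assumed to place the Data X bits in the upper half of the concatenation, and two's complement wraparound is assumed at 32 and 40 bits.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def dual_multiply_add(x0, y0, x1, y1, first="extend", second="product"):
    """Model of multipliers 311/312, the two selections and adder 319."""
    p0 = to_signed(x0 * y0, 32)  # multiplier 311: 16 by 16 -> 32 bits
    p1 = to_signed(x1 * y1, 32)  # multiplier 312: 16 by 16 -> 32 bits

    # First adder input (multiplexer 316).
    if first == "extend":
        a = p0                          # sign extended to 40 bits
    elif first == "shift_left":
        a = to_signed(p0 << 8, 40)      # 8 bit left shift, zero filled
    else:
        raise ValueError(first)

    # Second adder input (multiplexer 317, then sign extension).
    if second == "product":
        b = p1
    elif second == "shift_right":
        b = (p1 & 0xFFFFFFFF) >> 8      # zero fills the vacated top bits
    elif second == "bypass":
        # Assumed bit order: Data X in the upper 16 bits.
        b = to_signed(((x1 & 0xFFFF) << 16) | (y1 & 0xFFFF), 32)
    else:
        raise ValueError(second)

    return to_signed(a + b, 40)         # 40 bit adder 319
```

The extra 8 bits of adder width show up in the extreme case: two products of -32768 by -32768 sum to 2^31, which overflows a signed 32 bit value but fits in 40 bits.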

[0045] The outputs of the adders 319 within each of the dual
multiply adder units 310, 320, 330 and 340 are provided as the
first adder stage output adder_st1_outp. Only the 32 most
significant adder output bits are connected to the output.
This provides a 4 by 32 bit or 128 bit output.

[0046] A second stage of 40 bit adders includes adders 353 and
355. Adder 353 adds the outputs of dual multiply adder units
310 and 320. Adder 355 adds the outputs of dual multiply adder
units 330 and 340. Two other data paths join within the second
adder stage. The least significant 32 bits of the Data Z input
are temporarily stored in pipeline register 351. This 32 bit
quantity is sign extended to 40 bits in sign extend unit 352.
In a similar fashion, the most significant bits of the Data Z
input are temporarily stored in pipeline register 357. This
quantity is sign extended to 40 bits by sign extend unit 358.
[0047] The third adder stage includes adders 361, 363, 367 and
368. Adder 361 is 40 bits wide. It adds the output of adder
353 and the sign extended least significant Data Z bits. The
32 most significant bits of this sum are supplied as part of
the third stage output adder_st3_outp. Similarly, adder 363 is
40 bits wide and adds the output of adder 355 and the sign
extended most significant Data Z bits. The 32 most significant
bits of this sum are supplied as part of the third stage
output adder_st3_outp. The connections to adders 367 and 368
are much more complicated. The first input to adder 367 is
either the output of adder 353 of the second stage or a
recirculated output as selected by multiplexer 364.
Multiplexer 371 selects from among 8 pipeline registers for
the recirculation quantity. The second input to adder 367 is
selected by multiplexer 365. This is either the least
significant Data Z input as sign extended by sign extend unit
352, the direct output of adder 368, the output of adder 355
or a fixed rounding quantity rnd_add. Addition of the fixed
rounding quantity rnd_add causes the adder to round the
quantity at the other input. The output of adder 367 supplies
the input to variable right shifter 375. Variable right
shifter 375 right shifts the sum a selected amount of 0 to 15
bits. The 32 most significant bits of its output form a part
of the third stage output adder_st3_outp. The first input to
adder 368 is the output of adder 355. The second input to
adder 368 is selected by multiplexer 366. Multiplexer 366
selects either the output of adder 353, the most significant
Data Z input as sign extended by sign extend unit 358, the
recirculation input or the fixed rounding quantity rnd_add.
Multiplexer 373 selects the recirculation quantity from among
8 pipeline registers at the output of adder 368. The output of
adder 368 supplies the input to variable right shifter 377.
Variable right shifter 377 right shifts the sum a selected
amount of 0 to 15 bits. The 32 most significant bits of its
output form another part of the third stage output
adder_st3_outp.
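The effect of adding a rounding quantity before a right shift can be shown with a short sketch. The value of rnd_add is not given in the text; the sketch assumes it equals half of the weight of the least significant bit that survives the shift, which is the usual round-to-nearest choice. The function name `round_shift` is hypothetical.

```python
def round_shift(value, shift):
    """Right shift by `shift` bits with round-to-nearest.

    Assumes rnd_add = 2**(shift - 1); plain truncation is used
    when shift is 0 because there is nothing to round.
    """
    if shift == 0:
        return value
    rnd_add = 1 << (shift - 1)
    return (value + rnd_add) >> shift
```

Without the added constant, a right shift always rounds toward negative infinity; for example 7 shifted right by 1 bit gives 3, while the rounded version gives 4.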

[0048] Figure 9 illustrates the construction of the out- 
put formatter illustrated in Figure 6. 
[0049] Figures 10 to 13 illustrate several ways that multiple
multiply accumulate co-processor 140 may be configured. The
data flow in each of these examples can be achieved by proper
selection of the multiplexers within datapath 170. The
following description will note the multiplexer selections
when they are relevant to achieving the desired data flow.
[0050] Figure 10 illustrates the data flow in a real finite
impulse response (FIR) filter. Data D0 to D7 and coefficients
C0 to C7 are supplied to respective multipliers 311, 312, 321,
322, 331, 332, 341 and 342. In this case multiplexers
corresponding to multiplexer 317 in dual multiply adder unit
310 each select the product of the respective multipliers 312,
322, 332 and 342. Pairs of products are summed in adders 319,
329, 339 and 349. Pairs of these sums are further summed in
adders 353 and 355. The sums formed by adders 353 and 355 are
added in adder 368. In this case, multiplexer 366 selects the
sum produced by adder 353 for the second input to adder 368.
Adder 367 does the accumulate operation. Multiplexer 364
selects the output of multiplexer 371, selecting a pipeline
register for recirculation, as the first input to adder 367.
Multiplexer 365 selects the output of adder 368 as the second
input to adder 367. Adder 367 produces the filter output. Note
that this data flow produces the sum of the 8 products formed
with the prior summed products. This operation is generally
known as multiply accumulate and is widely used in filter
functions. Configuration of datapath 170 as illustrated in
Figure 10 permits computation of the accumulated sum of 8
products. This greatly increases the throughput in this data
flow over the typical single product accumulation provided by
digital signal processor core 110.
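The real FIR data flow amounts to an adder tree over 8 products followed by an accumulate. The sketch below mirrors that structure in software; it ignores pipelining, bit widths and shifting, and the function names `mac8` and `fir_output` are hypothetical.

```python
def mac8(data, coeffs, accumulator=0):
    """Sum of 8 products added to a running accumulator."""
    if len(data) != 8 or len(coeffs) != 8:
        raise ValueError("mac8 expects exactly 8 data and 8 coefficients")
    products = [d * c for d, c in zip(data, coeffs)]  # 8 multipliers
    # First stage: pairs of products (adders 319, 329, 339, 349).
    stage1 = [products[i] + products[i + 1] for i in range(0, 8, 2)]
    sum_353 = stage1[0] + stage1[1]                   # adder 353
    sum_355 = stage1[2] + stage1[3]                   # adder 355
    sum_368 = sum_353 + sum_355                       # adder 368
    return accumulator + sum_368                      # adder 367


def fir_output(samples, coeffs):
    """One FIR output for a tap count that is a multiple of 8."""
    acc = 0
    for start in range(0, len(coeffs), 8):
        acc = mac8(samples[start:start + 8], coeffs[start:start + 8], acc)
    return acc
```

A 16 tap filter output therefore takes two passes through the tree in this model, where a single multiply accumulate unit would take 16.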
[0051] Figure 11 illustrates the data flow of a complex FIR
filter. This data flow is similar to that of the real FIR
filter illustrated in Figure 10. The data flow of Figure 11
simultaneously operates on the real and imaginary parts of the
computation. Data and coefficients are supplied to respective
multipliers 311, 312, 321, 322, 331, 332, 341 and 342.
Multiplexers corresponding to multiplexer 317 in dual multiply
adder unit 310 each select the product of the respective
multipliers 312, 322, 332 and 342. Pairs of products are
summed in adders 319, 329, 339 and 349. Pairs of these sums
are further summed in adders 353 and 355. The real and
imaginary parts are separately handled by adders 367 and 368.
Multiplexer 365 selects the sum of adder 353 for the second
input to adder 367. Multiplexer 364 selects the output of
multiplexer 371, selecting a pipeline register for
recirculation, as the first input to adder 367. Adder 368
receives the sum of adder 355 as its first input. Multiplexer
366 selects the recirculation output of multiplexer 373 for
the second input to adder 368. The pair of adders 367 and 368
thus produce the real and imaginary parts of the multiply
accumulate operation.

[0052] Figure 12 illustrates the data flow in a coefficient
update operation. The error terms E0 to E3 are multiplied by
the corresponding weighting terms W0 to W3 in multipliers 311,
321, 331 and 341. The current coefficients to be updated C0 to
C3 are input directly to adders 319, 329, 339 and 349 as
selected by multiplexers 317, 327, 337 and 347. The respective
products are added to the current values in adders 319, 329,
339 and 349. In this case the output is produced by adders
319, 329, 339 and 349 via the adder stage 1 output
adder_st1_outp.

[0053] Figure 13 illustrates the data flow in a fast Fourier
transform (FFT) operation. The FFT operation starts with a 16
bit by 32 bit multiply operation. This is achieved as follows.
Each dual multiply adder 310, 320, 330 and 340 receives a
respective 16 bit quantity A0 to A3 at one input of each of
the paired multipliers 311 and 312, 321 and 322, 331 and 332,
and 341 and 342. Multipliers 311, 321, 331 and 341 receive the
16 most significant bits of the 32 bit quantity B0H to B3H.
Multipliers 312, 322, 332 and 342 receive the 16 least
significant bits of the 32 bit quantity B0L to B3L. Shifters
314, 315, 324, 325, 334, 335, 344 and 345 are used to align
the products. Multiplexers 316, 326, 336 and 346 select the
left shifted quantity from respective 8 bit left shifters 314,
324, 334 and 344 for the first input into respective adders
319, 329, 339 and 349. Multiplexers 317, 327, 337 and 347
select the right shifted quantity from respective 8 bit right
shifters 315, 325, 335 and 345 as the second inputs to
respective adders 319, 329, 339 and 349. These two oppositely
directed 8 bit shifts provide an effective 16 bit shift for
aligning the partial products for a 16 bit by 32 bit multiply.
Pairs of these sums are further summed in adders 353 and 355.
Adder 361 adds the Data Z0 input with the output from adder
353. Multiplexer 364 selects the sum of adder 353 as the first
input to adder 367. Multiplexer 365 selects the Data Z0 input
as the second input to adder 367. Adder 368 receives the sum
of adder 355 as its first input. Multiplexer 366 selects the
Data Z1 input as the second input to adder 368. Adder 363 adds
the sum of adder 355 and the Data Z1 input. The output of the
FFT operation is provided by the sum outputs of adders 361,
367, 368 and 363.
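The alignment of the two partial products can be checked with integer arithmetic. The sketch below is idealized and rests on stated assumptions: the low half of B is treated as an unsigned 16 bit value, the right shift is arithmetic (Python's `>>` on integers), and intermediate widths are unlimited. The function name `mul_16x32` is hypothetical.

```python
def mul_16x32(a, b):
    """Form (a * b) >> 8 from two 16 by 16 partial products.

    a is a signed 16 bit value and b a signed 32 bit value.
    """
    b_high = b >> 16        # signed upper 16 bits of b
    b_low = b & 0xFFFF      # unsigned lower 16 bits of b
    high_product = a * b_high
    low_product = a * b_low
    # Shift the high product left 8 and the low product right 8:
    # their relative offset is the required 16 bits.
    return (high_product << 8) + (low_product >> 8)
```

Since b equals b_high times 2^16 plus b_low, the sum equals the full product shifted right by 8 bits, so the result keeps the upper portion of the 48 bit product while each partial product stays aligned within the adder.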

[0054] The list below is a partial list of some of the
commands that may be performed by the datapath 170 of multiple
multiply accumulate unit 140 illustrated in Figures 6 to 9.

vector_add_16b(len, pdata, pcoeff, pout)
vector_add_32b(len, pdata, pcoeff, pout)
vector_mpy_16b(len, pdata, pcoeff, pout)
vector_mpy_1632b(len, pdata, pcoeff, pout)
vector_mpy_32b(len, pdata, pcoeff, pout)

scalar_vector_add_16b(len, pdata, pcoeff, pout)
scalar_vector_add_32b(len, pdata, pcoeff, pout)
scalar_vector_mpy_16b(len, pdata, pcoeff, pout)
scalar_vector_mpy_1632b(len, pdata, pcoeff, pout)
scalar_vector_mpy_32b(len, pdata, pcoeff, pout)

For these operations, the operation name indicates the data
size. The "len" parameter field indicates the length of the
function. The "pdata" parameter field is a pointer to the
beginning memory address containing the input data. The
"pcoeff" parameter field is a pointer to the beginning memory
address containing the coefficients for the filter. The "pout"
parameter field is a pointer to the beginning memory address
to receive the output. As previously described, these pointers
preferably point to respective locations within data memory
145 and coefficient memory 147 or unified memory 149.

FFT_real(fft_size, pdata, pcoeff, pout)
FFT_complex(fft_size, pdata, pcoeff, pout)

The fast Fourier transform operations preferably all include
32 bit data and 16 bit coefficients as previously described in
conjunction with Figure 13. The fft_size parameter field
defines the size of the function. The other listed parameter
fields are as described above.

FIR_real(us, ds, len, blocksize, pdata, pcoeff, pout)
FIR_complex_real(us, ds, len, blocksize, pdata, pcoeff, pout)
FIR_complex_real_sum(us, ds, len, blocksize, pdata, pcoeff, pout)
FIR_complex(us, ds, len, blocksize, pdata, pcoeff, pout)

The finite impulse response filter operations differ in the
type of the data and coefficients. The FIR_real operation
employs real data and real coefficients. The FIR_complex_real
operation employs complex data and real coefficients. The
FIR_complex_real_sum operation separately sums the complex and
real parts employing complex data and real coefficients. The
FIR_complex operation employs both complex data and complex
coefficients. The us parameter field indicates the upsampling
ratio. The ds parameter field indicates the downsampling
ratio. The blocksize parameter field indicates the size of the
operational blocks employed. The other parameter fields are as
previously described.
[0055] The parameters of all these commands could be either
immediate values or, for the data, coefficient and output
locations, 16 bit address pointers into the co-processor
memory. This selection would mean that the finite impulse
response filter commands, which are the longest, would require
about five 16 bit command words. This would be an
insignificant amount of bus traffic. Alternatively, the
parameter fields could be indirect, that is, identify a
register from a limited set of registers for each parameter.
There could be a set of 8 registers for each parameter,
requiring only 3 bits each within the command word. Since only
a limited number of particular filter settings would be
required, this is feasible.
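The register-indirect alternative can be illustrated with a packing sketch. The actual command word layout is not specified in the text, so everything here is an assumption made for illustration: the 8 bit opcode width, the field order, the opcode value used in the example and the function names are all hypothetical. Only the 3 bits per parameter come from the description.

```python
FIELDS = ("us", "ds", "len", "blocksize", "pdata", "pcoeff", "pout")
OPCODE_BITS = 8  # hypothetical opcode width
REG_BITS = 3     # 8 registers per parameter, as described


def encode_indirect(opcode, regs):
    """Pack an opcode and seven 3 bit register selectors into 16 bit words."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode out of range")
    value = opcode
    for name in FIELDS:
        reg = regs[name]
        if not 0 <= reg < (1 << REG_BITS):
            raise ValueError("register selector out of range: " + name)
        value = (value << REG_BITS) | reg
    total_bits = OPCODE_BITS + REG_BITS * len(FIELDS)  # 29 bits
    n_words = -(-total_bits // 16)                     # ceiling division
    return [(value >> (16 * i)) & 0xFFFF for i in reversed(range(n_words))]


def decode_indirect(words):
    """Inverse of encode_indirect: return (opcode, register selectors)."""
    value = 0
    for word in words:
        value = (value << 16) | word
    regs = {}
    for name in reversed(FIELDS):
        regs[name] = value & ((1 << REG_BITS) - 1)
        value >>= REG_BITS
    return value, regs
```

Under these assumptions a seven parameter FIR command fits in two 16 bit words, compared with about five words when the parameters are carried as immediate values and pointers.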



Claims 

1. A data processing system disposed on a single integrated
circuit comprising:

a digital signal processor core connected to a data bus and an
address bus, said digital signal processing core operable for
generating co-processor commands;
a co-processor connected to said data bus, said address bus
and said digital signal processing core, said co-processor
having a local memory within the address space of said digital
signal processor core and responsive to commands generated by
said digital signal processor core to perform predetermined
data processing operations on data stored in said local memory
in parallel to said digital signal processor core.

2. The data processing system of claim 1, further comprising:

a direct memory access circuit under the control of said
digital signal processor and capable of autonomously
transferring data between predefined addresses in memory
including transferring data to and from said local memory of
said co-processor.

3. The data processing system of claim 2, wherein:

said co-processor is responsive for receiving a data
synchronism command for pausing processing commands until said
direct memory access circuit signals completion of a
predetermined memory transfer of data into said local memory.

4. The data processing system of claim 2, wherein:

said co-processor is responsive to a send data synchronism
command for signalling said direct memory access circuit to
trigger a predetermined memory transfer of data out of said
local memory.

5. The data processing system of any preceding claim, wherein:

said co-processor further includes a command first-in-first-out
memory having an input responsive to data written to a
predetermined memory address and an output for controlling
operation of said co-processor.

6. The data processing system of any preceding claim, wherein
said co-processor is responsive to said commands for
configuring itself correspondingly whereby said co-processor
is operable to perform a set of related data processing
operations.

7. The data processing system of any preceding claim, wherein
said co-processor is responsive to an interrupt command for
transmitting an interrupt signal to said digital signal
processor core.



8. The data processing system of any preceding claim, wherein
each command includes an indication of a data input location
within said local memory; and

said co-processor is responsive to said commands to recall
data from said local memory starting with said indicated data
input location.

9. The data processing system of any preceding claim, wherein
each command includes an indication of a data output location
within said local memory; and

said co-processor is responsive to said commands for storing
resultant data from a data processing operation corresponding
to said command in local memory starting with said indicated
data input location.



10. A method of data processing comprising the steps of:

providing a local memory within a co-processor having
addresses within a memory map of a digital signal processor
core;
transferring data to said local memory;
transmitting a command to said co-processor thereby causing
said co-processor to perform a corresponding data processing
operation in parallel to said digital signal processor core
and store results in said local memory; and
transferring said results out of said local memory of said
co-processor.

11. The method of claim 10 wherein:

said step of transferring data to said local memory comprises
storing data in a next location in a circularly organized
memory area serving as an input buffer.

12. The method of claim 10 or claim 11 wherein:

said step of storing results in said local memory comprises
storing data in a next location in a circularly organized
memory area serving as an output buffer.

13. The method of claim 12, further comprising:

storing input data within a circularly organized history
buffer having a size corresponding to a time extent of said
corresponding data processing operation substantially
concurrently with said step of storing results in said local
memory.
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